FIGURE 2.10
Overview of the proposed Q-DETR framework. We introduce the distribution rectification distillation method (DRD) to refine the performance of Q-DETR. From left to right, we respectively show the detailed decoder architecture of Q-DETR and the learning framework of Q-DETR. The Q-Backbone, Q-Encoder, and Q-Decoder denote quantized architectures.

inaccurate object localization. Therefore, a more generic method for DETR quantization is necessary.

To tackle the issue above, we propose an efficient low-bit quantized DETR (Q-DETR) [257] by rectifying the query information of the quantized DETR to match that of its real-valued counterpart. Figure 2.10 provides an overview of our Q-DETR, which is mainly accomplished by a distribution rectification knowledge distillation method (DRD). We find that knowledge transfer from the real-valued teacher to the quantized student is ineffective primarily because of the information gap and distortion. Therefore, we formulate our DRD as a bi-level optimization framework established on the information bottleneck (IB) principle. Generally, it includes an inner-level optimization that maximizes the self-information entropy of student queries and an upper-level optimization that minimizes the conditional information entropy between student and teacher queries. At the inner level, we conduct a distribution alignment for the query guided by its Gaussian-like distribution, as shown in Fig. 2.8, leading to an explicit state in compliance with its maximum information entropy in the forward propagation. At the upper level, we introduce a new foreground-aware query matching that filters out low-quality student queries for exact one-to-one query matching between student and teacher, providing valuable knowledge gradients that push toward the minimum conditional information entropy in the backward propagation.
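To make the structure of this bi-level objective concrete, the following PyTorch sketch illustrates one plausible realization. The standardization rule used for the distribution alignment, the MSE surrogate for the conditional-entropy term, and the `fg_mask` construction are simplifying assumptions for illustration only, not the exact formulation of DRD.

```python
import torch
import torch.nn.functional as F

def rectify_queries(q):
    """Inner level (illustrative): align the query distribution.

    For a Gaussian-like query distribution, entropy is maximized by an
    explicit, well-spread state; as a stand-in for the paper's
    distribution alignment we simply standardize each query channel
    (zero mean, unit variance) in the forward pass.
    """
    mu = q.mean(dim=(0, 1), keepdim=True)
    sigma = q.std(dim=(0, 1), keepdim=True)
    return (q - mu) / (sigma + 1e-6)

def drd_distillation_loss(student_q, teacher_q, fg_mask):
    """Upper level (illustrative): penalize the gap between matched
    student and teacher queries, which drives the conditional entropy
    between them toward its minimum.

    `fg_mask` is a hypothetical boolean mask that keeps only student
    queries matched one-to-one to foreground teacher queries (the
    foreground-aware query matching); its construction is not shown.
    """
    s = rectify_queries(student_q)[fg_mask]
    t = teacher_q[fg_mask]
    return F.mse_loss(s, t)

# Hypothetical shapes: (num_queries, batch, embed_dim), as in DETR decoders.
student_q = torch.randn(300, 2, 256)
teacher_q = torch.randn(300, 2, 256)
fg_mask = torch.rand(300, 2) > 0.7   # placeholder for matched foreground queries
loss = drd_distillation_loss(student_q, teacher_q, fg_mask)
```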

2.4.1 Quantized DETR Baseline

We first construct a baseline to study the low-bit DETR since no relevant work has been proposed. To this end, we follow LSQ+ [13] to introduce a general framework of asymmetric activation quantization and symmetric weight quantization:

$$
\begin{aligned}
x_q &= \left\lfloor \mathrm{clip}\Big\{\tfrac{x - z}{\alpha_x},\, Q_n^x,\, Q_p^x\Big\} \right\rceil, &
w_q &= \left\lfloor \mathrm{clip}\Big\{\tfrac{w}{\alpha_w},\, Q_n^w,\, Q_p^w\Big\} \right\rceil, \\
Q_a(x) &= \alpha_x \circ x_q + z, &
Q_w(w) &= \alpha_w \circ w_q,
\end{aligned}
\tag{2.24}
$$

where $\mathrm{clip}\{y, r_1, r_2\}$ clips the input $y$ with value bounds $r_1$ and $r_2$; $\lfloor y \rceil$ rounds $y$ to its nearest integer; $\circ$ denotes the channel-wise multiplication. And $Q_n^x = -2^{a-1}$, $Q_p^x =$
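As a concrete illustration of Eq. (2.24), the PyTorch sketch below applies the asymmetric activation quantizer and the symmetric weight quantizer. The 4-bit width, the signed integer bounds, and the per-tensor (rather than channel-wise) scales are assumptions made for brevity; the straight-through gradient estimation and the scale/offset learning of LSQ+ are omitted.

```python
import torch

def fake_quantize_activation(x, alpha_x, z, bits=4):
    """Asymmetric activation quantization in the style of Eq. (2.24):
    shift by the offset z, scale by the step size alpha_x, clip to the
    integer bounds, round, then de-quantize back to the real domain.
    """
    qn, qp = -2 ** (bits - 1), 2 ** (bits - 1) - 1
    # In training, the non-differentiable round is bypassed with a
    # straight-through estimator; omitted here for clarity.
    xq = torch.clamp((x - z) / alpha_x, qn, qp).round()
    return alpha_x * xq + z          # Q_a(x)

def fake_quantize_weight(w, alpha_w, bits=4):
    """Symmetric weight quantization in the style of Eq. (2.24)."""
    qn, qp = -2 ** (bits - 1), 2 ** (bits - 1) - 1
    wq = torch.clamp(w / alpha_w, qn, qp).round()
    return alpha_w * wq              # Q_w(w)

# Example usage with per-tensor scales (values are placeholders).
x = torch.randn(2, 256, 32, 32)
x_deq = fake_quantize_activation(x, alpha_x=torch.tensor(0.05), z=torch.tensor(0.1))
w = torch.randn(256, 256, 3, 3)
w_deq = fake_quantize_weight(w, alpha_w=torch.tensor(0.02))
```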